Name | Version | Summary | Date |
---|---|---|---|
minference | 0.1.5.post1 | Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. | 2024-08-13 09:39:09 |
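The core idea behind the summary above, dynamic sparse attention, can be illustrated with a toy sketch (this is an assumption-laden illustration of the general technique, not MInference's actual kernels or API): each query attends only to its top-k highest-scoring keys, approximating full softmax attention while skipping most of the score matrix.

```python
import numpy as np

def dynamic_sparse_attention(q, k, v, top_k=4):
    """Toy top-k sparse attention (illustrative only, not MInference's implementation).

    Each query keeps only its top_k highest-scoring keys; the rest are masked
    out before the softmax, so most attention weights are exactly zero.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (n_q, n_k) scaled dot-product scores
    # Indices of the top_k keys per query row.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    # Additive mask: 0 for kept positions, -inf for dropped ones.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    sparse = scores + mask
    # Numerically stable softmax over the surviving positions.
    w = np.exp(sparse - sparse.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
k = rng.normal(size=(32, 16))
v = rng.normal(size=(32, 16))
out = dynamic_sparse_attention(q, k, v, top_k=8)
print(out.shape)  # (8, 16)
```

In a real long-context kernel the sparsity pattern is chosen dynamically per head (and computed with fused GPU kernels) rather than via a dense score matrix as here; the dense version above only shows the masking logic.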
Hour | Day | Week | Total |
---|---|---|---|
63 | 1376 | 7377 | 283653 |